Academic Integrity Declaration

We’re part of an academic community at Warwick. Whether studying, teaching, or researching, we are all taking part in an expert conversation which must meet standards of academic integrity. When we all meet these standards, we can take pride in our own academic achievements, as individuals and as an academic community.

Academic integrity means committing to honesty in academic work, giving credit where we’ve used others’ ideas and being proud of our own achievements.

In submitting my work, I confirm that:

I have read the guidance on academic integrity provided in the Student Handbook and understand the University regulations in relation to Academic Integrity. I am aware of the potential consequences of Academic Misconduct.

I declare that the work is all my own, except where I have stated otherwise.

No substantial part(s) of the work submitted here has also been submitted by me in other credit bearing assessments courses of study (other than in certain cases of a re-submission of a piece of work), and I acknowledge that if this has been done this may lead to an appropriate sanction.

Where a generative Artificial Intelligence such as ChatGPT has been used I confirm I have abided by both the University guidance and specific requirements as set out in the Student Handbook and the Assessment brief. I have clearly acknowledged the use of any generative Artificial Intelligence in my submission, my reasoning for using it and which generative AI (or AIs) I have used. Except where indicated the work is otherwise entirely my own.

I understand that should this piece of work raise concerns requiring investigation in relation to any of points above, it is possible that other work I have submitted for assessment will be checked, even if marks (provisional or confirmed) have been published.

Where a proof-reader, paid or unpaid was used, I confirm that the proof-reader was made aware of and has complied with the University’s proofreading policy.


Question 1

Data Dictionary

Variable             Description
date                 Date on which the bikes were hired
Hires                Number of bikes hired on that date
wfh                  Work-from-home policy. 1 indicates the policy was in force on that date, 0 indicates it was not
rule_of_6_indoors    Policy allowing at most six people, from any number of households, to meet indoors. 1 indicates the policy was in force on that date, 0 indicates it was not
eat_out_to_help_out  Policy giving diners a 50% discount on meals in restaurants. 1 indicates the policy was in force on that date, 0 indicates it was not
day                  Day of the week of the observation (Sun to Sat)
month                Month of the observation (Jan to Dec)
year                 Year of the observation (2010 to 2023)

Master Data

# Load packages (Hmisc provides rcorr()) and the dataset
library(tidyverse)
library(Hmisc)
bike_data <- read_csv("London_COVID_bikes.csv")
## Rows: 4812 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (2): day, month
## dbl  (12): Hires, schools_closed, pubs_closed, shops_closed, eating_places_c...
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Exploration

Check data structure

# Check Data Structure
str(bike_data)
## spc_tbl_ [4,812 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ date                           : Date[1:4812], format: "2010-07-30" "2010-07-31" ...
##  $ Hires                          : num [1:4812] 6897 5564 4303 6642 7966 ...
##  $ schools_closed                 : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ pubs_closed                    : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ shops_closed                   : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ eating_places_closed           : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ stay_at_home                   : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ household_mixing_indoors_banned: num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ wfh                            : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ rule_of_6_indoors              : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ curfew                         : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ eat_out_to_help_out            : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
##  $ day                            : chr [1:4812] "Fri" "Sat" "Sun" "Mon" ...
##  $ month                          : chr [1:4812] "Jul" "Jul" "Aug" "Aug" ...
##  $ year                           : num [1:4812] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_date(format = ""),
##   ..   Hires = col_double(),
##   ..   schools_closed = col_double(),
##   ..   pubs_closed = col_double(),
##   ..   shops_closed = col_double(),
##   ..   eating_places_closed = col_double(),
##   ..   stay_at_home = col_double(),
##   ..   household_mixing_indoors_banned = col_double(),
##   ..   wfh = col_double(),
##   ..   rule_of_6_indoors = col_double(),
##   ..   curfew = col_double(),
##   ..   eat_out_to_help_out = col_double(),
##   ..   day = col_character(),
##   ..   month = col_character(),
##   ..   year = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(bike_data)
## # A tibble: 6 × 15
##   date       Hires schools_closed pubs_closed shops_closed eating_places_closed
##   <date>     <dbl>          <dbl>       <dbl>        <dbl>                <dbl>
## 1 2010-07-30  6897              0           0            0                    0
## 2 2010-07-31  5564              0           0            0                    0
## 3 2010-08-01  4303              0           0            0                    0
## 4 2010-08-02  6642              0           0            0                    0
## 5 2010-08-03  7966              0           0            0                    0
## 6 2010-08-04  7893              0           0            0                    0
## # ℹ 9 more variables: stay_at_home <dbl>,
## #   household_mixing_indoors_banned <dbl>, wfh <dbl>, rule_of_6_indoors <dbl>,
## #   curfew <dbl>, eat_out_to_help_out <dbl>, day <chr>, month <chr>, year <dbl>
summary(bike_data)
##       date                Hires       schools_closed     pubs_closed     
##  Min.   :2010-07-30   Min.   :    0   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:2013-11-13   1st Qu.:19776   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :2017-02-28   Median :26356   Median :0.00000   Median :0.00000  
##  Mean   :2017-02-28   Mean   :26607   Mean   :0.02743   Mean   :0.05175  
##  3rd Qu.:2020-06-15   3rd Qu.:33481   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :2023-09-30   Max.   :73094   Max.   :1.00000   Max.   :1.00000  
##   shops_closed     eating_places_closed  stay_at_home    
##  Min.   :0.00000   Min.   :0.00000      Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000      1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000      Median :0.00000  
##  Mean   :0.04634   Mean   :0.05175      Mean   :0.03616  
##  3rd Qu.:0.00000   3rd Qu.:0.00000      3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000      Max.   :1.00000  
##  household_mixing_indoors_banned      wfh         rule_of_6_indoors
##  Min.   :0.00000                 Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.00000                 1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.00000                 Median :0.0000   Median :0.00000  
##  Mean   :0.06525                 Mean   :0.2273   Mean   :0.01995  
##  3rd Qu.:0.00000                 3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.00000                 Max.   :1.0000   Max.   :1.00000  
##      curfew        eat_out_to_help_out     day               month          
##  Min.   :0.00000   Min.   :0.000000    Length:4812        Length:4812       
##  1st Qu.:0.00000   1st Qu.:0.000000    Class :character   Class :character  
##  Median :0.00000   Median :0.000000    Mode  :character   Mode  :character  
##  Mean   :0.01164   Mean   :0.005819                                         
##  3rd Qu.:0.00000   3rd Qu.:0.000000                                         
##  Max.   :1.00000   Max.   :1.000000                                         
##       year     
##  Min.   :2010  
##  1st Qu.:2013  
##  Median :2017  
##  Mean   :2017  
##  3rd Qu.:2020  
##  Max.   :2023
# Count the missing values in each column
bike_data %>% summarise(across(everything(), ~ sum(is.na(.x))))
## # A tibble: 1 × 15
##    date Hires schools_closed pubs_closed shops_closed eating_places_closed
##   <int> <int>          <int>       <int>        <int>                <int>
## 1     0     0              0           0            0                    0
## # ℹ 9 more variables: stay_at_home <int>,
## #   household_mixing_indoors_banned <int>, wfh <int>, rule_of_6_indoors <int>,
## #   curfew <int>, eat_out_to_help_out <int>, day <int>, month <int>, year <int>
# There are no NAs
# Convert day, month and year to factors with explicit level order,
# then store the level positions (Sun = 1, Jan = 1) as separate factors
bike_data <- bike_data %>%
  mutate(
    day = factor(day, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
    month = factor(month, levels = month.abb),
    year = factor(year, levels = 2010:2023),
    daynum = factor(as.numeric(day)),
    monthnum = factor(as.numeric(month))
  )

# Keep only the observations from 2020 onwards (the COVID-19 period)
bike_data_covid <- bike_data %>%
  filter(year %in% c("2020", "2021", "2022", "2023"))
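
The helper columns daynum and monthnum rely on how R converts factors: as.numeric() applied to a factor returns the position of each value in the level list, not the printed label. The following minimal sketch illustrates this with a few made-up values rather than the bike data; all object names in it are hypothetical.

```r
# Toy values for illustration only (not taken from London_COVID_bikes.csv)
day_chr <- c("Fri", "Sat", "Sun", "Mon")
day_fct <- factor(day_chr, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# as.numeric() returns the level position: Sun = 1, Mon = 2, ..., Sat = 7
day_codes <- as.numeric(day_fct)
print(day_codes)

# For a factor whose labels are numbers, the positions differ from the labels
year_fct <- factor(c(2020, 2023), levels = 2010:2023)
year_codes <- as.numeric(year_fct)                 # positions in the level list
year_values <- as.numeric(as.character(year_fct))  # the labels themselves
print(year_codes)
print(year_values)
```

The codes are only meaningful when the level order itself is meaningful, as it is for weekdays, months and years here.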

Data Visualization

ggplot(bike_data_covid, aes(x = Hires)) +
  geom_histogram(binwidth = 1200, fill = "darkgreen", color = "black") +
  labs(x = "Number of Hired Bikes", y = "Count", title = "Distribution of Hired Bikes During COVID-19") +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0.5))

Check Correlation Between Variables

# Spearman correlations between the predictor variables
correlation_matrix_bikes <- bike_data_covid %>%
  select(wfh, rule_of_6_indoors, eat_out_to_help_out, daynum, monthnum, year) %>%
  as.matrix() %>%
  rcorr(type = "spearman")

print(correlation_matrix_bikes)
##                       wfh rule_of_6_indoors eat_out_to_help_out daynum monthnum
## wfh                  1.00              0.08               -0.29      0    -0.15
## rule_of_6_indoors    0.08              1.00               -0.04      0     0.09
## eat_out_to_help_out -0.29             -0.04                1.00      0     0.08
## daynum               0.00              0.00                0.00      1     0.00
## monthnum            -0.15              0.09                0.08      0     1.00
## year                 0.40             -0.19               -0.19      0    -0.12
##                      year
## wfh                  0.40
## rule_of_6_indoors   -0.19
## eat_out_to_help_out -0.19
## daynum               0.00
## monthnum            -0.12
## year                 1.00
## 
## n= 1370 
## 
## 
## P
##                     wfh    rule_of_6_indoors eat_out_to_help_out daynum
## wfh                        0.0027            0.0000              0.8982
## rule_of_6_indoors   0.0027                   0.1424              0.9880
## eat_out_to_help_out 0.0000 0.1424                                0.9938
## daynum              0.8982 0.9880            0.9938                    
## monthnum            0.0000 0.0009            0.0021              0.8623
## year                0.0000 0.0000            0.0000              0.9947
##                     monthnum year  
## wfh                 0.0000   0.0000
## rule_of_6_indoors   0.0009   0.0000
## eat_out_to_help_out 0.0021   0.0000
## daynum              0.8623   0.9947
## monthnum                     0.0000
## year                0.0000
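
As a reminder of what the coefficients in this matrix measure: a Spearman correlation is a Pearson correlation computed on ranks, which is why it can be applied to ordinal codes such as daynum and monthnum. The base R sketch below shows the equivalence on made-up numbers (not the bike data); the variable names are illustrative assumptions.

```r
# Made-up illustrative values (not from the bike data)
policy_flag <- c(0, 0, 1, 1, 0, 1, 1, 0)
month_code <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Spearman correlation as computed by base R
rho_spearman <- cor(policy_flag, month_code, method = "spearman")

# The same quantity obtained as a Pearson correlation of the tie-averaged ranks
rho_ranks <- cor(rank(policy_flag), rank(month_code), method = "pearson")

print(c(spearman = rho_spearman, pearson_on_ranks = rho_ranks))
```

Because only the ordering of the values matters, recoding a variable with any order-preserving transformation leaves the Spearman coefficient unchanged.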

Analysis 2: Further Analysis with model interactions

Linear Regression Models with Interactions
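
A full interaction between a policy indicator and several time factors estimates a separate effect for every combination of factor levels, so the number of coefficients grows multiplicatively. The sketch below, on simulated data with hypothetical names (toy, rep_id, policy, hires), shows one way to compare the size of an additive model with a fully interacted one and to test whether the interaction terms improve the fit. It is an illustration under these assumptions, not the analysis used in this report.

```r
set.seed(42)  # simulated data only; none of these values come from the bike data

# Balanced toy design: 3 days x 2 months x 6 replicates = 36 rows
toy <- expand.grid(
  day = factor(c("Sat", "Sun", "Mon")),
  month = factor(c("Jan", "Feb")),
  rep_id = 1:6
)
# The policy is on for even-numbered replicates, so it varies within every cell
toy$policy <- as.integer(toy$rep_id %% 2 == 0)
toy$hires <- 1000 + 200 * toy$policy + rnorm(nrow(toy), sd = 50)

additive_model <- lm(hires ~ policy + month + day, data = toy)
interaction_model <- lm(hires ~ policy * month * day, data = toy)

# Number of estimated coefficients in each model
n_coefs <- c(
  additive = length(coef(additive_model)),
  interaction = length(coef(interaction_model))
)
print(n_coefs)

# F-test of the nested models: do the interaction terms add explanatory power?
print(anova(additive_model, interaction_model))
```

By the same arithmetic, a binary policy crossed with 4 years, 12 months and 7 weekdays allows up to 2 × 4 × 12 × 7 = 672 coefficients, while the COVID-period subset has 1,370 rows, so each year-month-weekday cell holds only four or five days.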

# Interaction models including time variables
interaction_model_wfh <- lm(Hires ~ wfh * year * month * day, data = bike_data_covid)
interaction_model_rule_of_6 <- lm(Hires ~ rule_of_6_indoors * year * month * day, data = bike_data_covid)
interaction_model_eat_out <- lm(Hires ~ eat_out_to_help_out * year * month * day, data = bike_data_covid)

# Function to extract and display only significant coefficients
display_significant_coefs <- function(model) {
  coefs <- summary(model)$coefficients
  # Keep rows with a p-value below 0.05 (significant at the 5% level);
  # drop = FALSE keeps the result a matrix even if only one row remains
  significant_coefs <- coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
  return(significant_coefs)
}
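
A quick way to check a helper like this is to run it on a small built-in dataset. The sketch below applies the same filtering idea to a model fitted on mtcars, which ships with R; the function name significant_rows and its alpha argument are hypothetical, and drop = FALSE is used so that the result stays a matrix even when a single coefficient passes the threshold.

```r
# Hypothetical stand-alone version of the filtering step, for illustration only
significant_rows <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients
  # drop = FALSE keeps the matrix shape when a single row is selected
  coefs[coefs[, "Pr(>|t|)"] < alpha, , drop = FALSE]
}

# mtcars is bundled with base R; this model is unrelated to the bike data
demo_model <- lm(mpg ~ wt + qsec, data = mtcars)
demo_result <- significant_rows(demo_model)
print(demo_result)

# A much stricter threshold leaves a one-row matrix rather than a bare vector
strict_result <- significant_rows(demo_model, alpha = 1e-6)
print(strict_result)
```

Keeping the matrix shape means the row name of the surviving coefficient is still printed, which matters when the filtered tables are collected in a list.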

# Extract and display only the significant coefficients from each interaction model
list(
  wfh = display_significant_coefs(interaction_model_wfh),
  rule_of_6 = display_significant_coefs(interaction_model_rule_of_6),
  eat_out = display_significant_coefs(interaction_model_eat_out)
)
## $wfh
##                           Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)               15064.00   3155.150  4.774416 2.063705e-06
## wfh                       13319.30   6150.520  2.165557 3.057469e-02
## year2023                 -15775.90   5986.476 -2.635256 8.534021e-03
## monthMay                  15789.70   5986.476  2.637562 8.476630e-03
## monthJun                  17728.45   6150.520  2.882431 4.028425e-03
## monthAug                  23736.00   4233.078  5.607268 2.641179e-08
## monthSep                  29916.00   4819.571  6.207191 7.815248e-10
## monthOct                  22918.45   6414.609  3.572852 3.695046e-04
## monthNov                  15879.00   6567.968  2.417643 1.579467e-02
## dayMon                     9773.00   4462.056  2.190246 2.873066e-02
## dayTue                    11842.00   4462.056  2.653934 8.078872e-03
## dayWed                     8656.80   4233.078  2.045037 4.110569e-02
## dayThu                    10536.80   4233.078  2.489158 1.296201e-02
## dayFri                     9714.60   4233.078  2.294926 2.193831e-02
## wfh:monthAug             -15506.65   5986.476 -2.590280 9.725614e-03
## wfh:monthSep             -31605.30   9535.307 -3.314555 9.499741e-04
## wfh:monthOct             -25825.75   8886.850 -2.906063 3.738654e-03
## year2021:monthMar         28369.95   8105.727  3.499988 4.852487e-04
## year2022:monthMar         20865.75   7055.130  2.957529 3.172163e-03
## year2023:monthMar         18277.15   8105.727  2.254844 2.435308e-02
## year2021:monthApr         15094.95   7466.448  2.021704 4.346583e-02
## monthAug:dayMon          -12482.80   5986.476 -2.085167 3.730067e-02
## monthSep:dayMon          -20299.33   6815.903 -2.978231 2.967278e-03
## monthApr:dayTue          -18479.00   8698.148 -2.124475 3.386880e-02
## monthAug:dayTue          -12773.25   6150.520 -2.076776 3.807044e-02
## monthSep:dayTue          -17232.33   6815.903 -2.528254 1.161207e-02
## monthApr:dayWed          -20131.70   8582.936 -2.345549 1.918856e-02
## monthSep:dayWed          -15440.47   6668.249 -2.315521 2.078075e-02
## monthAug:dayThu          -13803.05   5986.476 -2.305705 2.132561e-02
## monthSep:dayThu          -17804.13   6668.249 -2.669986 7.705197e-03
## monthApr:dayFri          -17669.55   8698.148 -2.031415 4.246999e-02
## monthAug:dayFri          -14283.85   5986.476 -2.386020 1.721220e-02
## monthSep:dayFri          -15875.93   6668.249 -2.380825 1.745545e-02
## wfh:monthSep:dayMon       30802.73  13484.960  2.284229 2.256144e-02
## wfh:monthSep:dayTue       32342.58  12725.337  2.541590 1.118104e-02
## wfh:monthSep:dayThu       27267.03  13336.497  2.044542 4.115455e-02
## year2022:monthApr:dayTue  19739.30   8811.854  2.240085 2.529868e-02
## year2022:monthJun:dayTue  19008.70   8698.148  2.185373 2.908682e-02
## year2022:monthApr:dayWed  18329.00   8698.148  2.107230 3.533960e-02
## year2021:monthJun:dayWed  25868.05  10559.151  2.449823 1.445873e-02
## year2022:monthJun:dayWed  22260.90   8698.148  2.559269 1.063158e-02
## year2022:monthJun:dayFri  18693.10   8811.854  2.121358 3.413069e-02
## year2022:monthNov:dayFri  17447.25   8698.148  2.005858 4.513330e-02
## 
## $rule_of_6
##                                    Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                        15064.00   3206.120  4.698514 2.976735e-06
## monthApr                           19324.50   4534.139  4.262000 2.212390e-05
## monthMay                           29109.00   4301.462  6.767234 2.202309e-11
## monthJun                           31047.75   4534.139  6.847552 1.291149e-11
## monthJul                           26207.00   4534.139  5.779929 9.905247e-09
## monthAug                           23736.00   4301.462  5.518124 4.337319e-08
## monthSep                           28488.50   5553.163  5.130139 3.460221e-07
## monthOct                           14437.00   5553.163  2.599780 9.462232e-03
## monthNov                           13346.40   4301.462  3.102759 1.969644e-03
## dayMon                              9773.00   4534.139  2.155426 3.136033e-02
## dayTue                             11842.00   4534.139  2.611742 9.139636e-03
## dayWed                              8656.80   4301.462  2.012525 4.442527e-02
## dayThu                             10536.80   4301.462  2.449586 1.446819e-02
## dayFri                              9714.60   4301.462  2.258442 2.412729e-02
## rule_of_6_indoors:year2021         29452.67   9794.859  3.006952 2.703035e-03
## rule_of_6_indoors:monthJun        -30333.27   9689.346 -3.130579 1.793894e-03
## year2021:monthMar                  13120.85   6083.185  2.156904 3.124460e-02
## year2023:monthApr                 -13439.50   6083.185 -2.209287 2.737480e-02
## year2021:monthMay                 -16665.93   6358.581 -2.621015 8.896372e-03
## year2022:monthMay                 -15645.40   5911.794 -2.646473 8.258014e-03
## year2023:monthMay                 -16880.40   6083.185 -2.774928 5.621594e-03
## year2022:monthJun                 -19145.55   6249.879 -3.063347 2.245802e-03
## year2023:monthJun                 -19684.65   6249.879 -3.149605 1.682142e-03
## year2021:monthJul                 -22392.60   8360.539 -2.678368 7.516322e-03
## year2023:monthJul                 -17511.80   6083.185 -2.878722 4.075717e-03
## year2023:monthAug                 -15506.65   6083.185 -2.549100 1.094460e-02
## year2022:monthSep                 -31741.05   7024.258 -4.518777 6.941925e-06
## year2023:monthSep                 -19097.40   7024.258 -2.718778 6.662778e-03
## year2022:monthOct                 -15913.80   6876.362 -2.314276 2.084915e-02
## year2022:monthNov                 -20327.45   6083.185 -3.341580 8.630496e-04
## year2022:monthDec                 -13319.30   6249.879 -2.131129 3.331546e-02
## year2021:dayMon                   -12834.60   6249.879 -2.053576 4.026956e-02
## year2021:dayTue                   -15730.10   6249.879 -2.516865 1.199181e-02
## year2021:dayWed                   -13123.90   6083.185 -2.157406 3.120543e-02
## year2021:dayThu                   -13619.90   6083.185 -2.238942 2.537319e-02
## monthApr:dayMon                   -28352.25   6412.241 -4.421582 1.084084e-05
## monthMay:dayMon                   -21863.50   6249.879 -3.498228 4.884252e-04
## monthJun:dayMon                   -19731.35   6249.879 -3.157077 1.640034e-03
## monthJul:dayMon                   -18006.00   6412.241 -2.808067 5.078414e-03
## monthAug:dayMon                   -12482.80   6083.185 -2.052017 4.042111e-02
## monthSep:dayMon                   -20371.50   9068.278 -2.246457 2.488655e-02
## monthNov:dayMon                   -16653.60   6083.185 -2.737645 6.295020e-03
## monthApr:dayTue                   -32976.75   6412.241 -5.142781 3.240677e-07
## monthMay:dayTue                   -27881.50   6249.879 -4.461126 9.052109e-06
## monthJun:dayTue                   -18558.15   6249.879 -2.969362 3.053530e-03
## monthJul:dayTue                   -16647.75   6412.241 -2.596245 9.559466e-03
## monthAug:dayTue                   -12773.25   6249.879 -2.043760 4.123201e-02
## monthSep:dayTue                   -17806.00   7853.359 -2.267310 2.357844e-02
## monthNov:dayTue                   -15307.40   6249.879 -2.449231 1.448235e-02
## monthApr:dayWed                   -28340.10   6083.185 -4.658760 3.598865e-06
## monthMay:dayWed                   -22482.30   6083.185 -3.695810 2.307640e-04
## monthJun:dayWed                   -23639.30   6249.879 -3.782361 1.643163e-04
## monthJul:dayWed                   -15798.20   6083.185 -2.597028 9.537872e-03
## monthSep:dayWed                   -16592.30   7721.360 -2.148883 3.187687e-02
## monthOct:dayWed                   -15363.30   7721.360 -1.989714 4.688728e-02
## monthApr:dayThu                   -29526.30   6083.185 -4.853756 1.399410e-06
## monthMay:dayThu                   -23530.05   6083.185 -3.868047 1.166161e-04
## monthJun:dayThu                   -24019.30   6249.879 -3.843162 1.289160e-04
## monthJul:dayThu                   -16849.20   6083.185 -2.769799 5.710200e-03
## monthAug:dayThu                   -13803.05   6083.185 -2.269050 2.347207e-02
## monthSep:dayThu                   -18536.80   7721.360 -2.400717 1.654004e-02
## monthNov:dayThu                   -16503.20   6083.185 -2.712921 6.780836e-03
## monthApr:dayFri                   -25850.60   6249.879 -4.136176 3.820034e-05
## monthMay:dayFri                   -20046.60   5911.794 -3.390951 7.230303e-04
## monthJun:dayFri                   -20042.35   6249.879 -3.206838 1.383480e-03
## monthAug:dayFri                   -14283.85   6083.185 -2.348087 1.905903e-02
## monthSep:dayFri                   -16092.10   7721.360 -2.084102 3.739761e-02
## monthNov:dayFri                   -15748.00   6083.185 -2.588775 9.767933e-03
## rule_of_6_indoors:year2021:dayMon -32115.67  13348.143 -2.406003 1.630402e-02
## rule_of_6_indoors:year2021:dayTue -34968.17  13348.143 -2.619703 8.930435e-03
## rule_of_6_indoors:year2021:dayWed -34230.67  13348.143 -2.564452 1.047512e-02
## rule_of_6_indoors:year2021:dayThu -38310.33  12824.481 -2.987281 2.881567e-03
## rule_of_6_indoors:monthMay:dayMon  30735.83  12824.481  2.396653 1.672354e-02
## rule_of_6_indoors:monthJun:dayMon  47893.62  13193.227  3.630167 2.972073e-04
## rule_of_6_indoors:monthJun:dayTue  53764.77  13115.083  4.099461 4.467966e-05
## rule_of_6_indoors:monthJun:dayWed  51143.62  13115.083  3.899603 1.026092e-04
## rule_of_6_indoors:monthJun:dayThu  56432.40  12930.910  4.364148 1.405122e-05
## rule_of_6_indoors:monthJun:dayFri  31180.53  12851.171  2.426280 1.542584e-02
## year2021:monthApr:dayMon           23106.60   8954.207  2.580530 1.000273e-02
## year2022:monthApr:dayMon           24330.15   8838.664  2.752696 6.014864e-03
## year2023:monthApr:dayMon           22528.95   8721.590  2.583124 9.928322e-03
## year2021:monthMay:dayMon           20981.93   9582.671  2.189570 2.877981e-02
## year2022:monthMay:dayMon           19855.10   8482.597  2.340687 1.943892e-02
## year2022:monthJun:dayMon           25240.00   8721.590  2.893968 3.884510e-03
## year2023:monthJun:dayMon           18180.95   8721.590  2.084591 3.735302e-02
## year2021:monthJul:dayMon           39076.10  11013.385  3.548055 4.056289e-04
## year2022:monthJul:dayMon           17515.15   8721.590  2.008252 4.487798e-02
## year2022:monthSep:dayMon           25829.15  10919.652  2.365382 1.819649e-02
## year2021:monthNov:dayMon           23339.05   8602.923  2.712921 6.780834e-03
## year2022:monthNov:dayMon           23185.50   8602.923  2.695072 7.152254e-03
## year2021:monthDec:dayMon           19586.85   8838.664  2.216042 2.690722e-02
## year2021:monthApr:dayTue           30099.60   8954.207  3.361504 8.037591e-04
## year2022:monthApr:dayTue           34237.05   8954.207  3.823572 1.394500e-04
## year2023:monthApr:dayTue           31581.45   8721.590  3.621066 3.077241e-04
## year2021:monthMay:dayTue           30654.93   9582.671  3.198997 1.421280e-03
## year2022:monthMay:dayTue           28903.20   8602.923  3.359695 8.089811e-04
## year2023:monthMay:dayTue           24224.90   8602.923  2.815892 4.957300e-03
## year2022:monthJun:dayTue           33506.45   8838.664  3.790895 1.588466e-04
## year2023:monthJun:dayTue           18172.00   8721.590  2.083565 3.744656e-02
## year2021:monthJul:dayTue           39778.85  11013.385  3.611864 3.187107e-04
## year2022:monthJul:dayTue           18737.30   8838.664  2.119925 3.425173e-02
## year2023:monthJul:dayTue           17121.25   8721.590  1.963088 4.990550e-02
## year2021:monthAug:dayTue           20266.35   8602.923  2.355752 1.867247e-02
## year2022:monthSep:dayTue           26528.80  10036.744  2.643168 8.338493e-03
## year2021:monthOct:dayTue           21629.90   9933.800  2.177404 2.967747e-02
## year2022:monthOct:dayTue           25097.30   9933.800  2.526455 1.167133e-02
## year2021:monthNov:dayTue           26574.35   8721.590  3.046962 2.370738e-03
## year2022:monthNov:dayTue           24890.75   8721.590  2.853923 4.405076e-03
## year2021:monthDec:dayTue           17859.55   8838.664  2.020617 4.357862e-02
## year2021:monthApr:dayWed           23372.95   8721.590  2.679896 7.482353e-03
## year2022:monthApr:dayWed           26537.40   8721.590  3.042725 2.404061e-03
## year2023:monthApr:dayWed           26272.85   8602.923  3.053944 2.316737e-03
## year2021:monthMay:dayWed           26631.73   9474.795  2.810798 5.035842e-03
## year2022:monthMay:dayWed           20185.45   8602.923  2.346348 1.914773e-02
## year2023:monthMay:dayWed           19319.55   8602.923  2.245696 2.493549e-02
## year2022:monthJun:dayWed           30469.30   8721.590  3.493549 4.969617e-04
## year2023:monthJun:dayWed           25054.45   8838.664  2.834642 4.677649e-03
## year2021:monthJul:dayWed           36792.30  10825.108  3.398793 7.028396e-04
## year2021:monthAug:dayWed           23309.90   8602.923  2.709532 6.849980e-03
## year2021:monthSep:dayWed           22702.85   9829.779  2.309599 2.110798e-02
## year2022:monthSep:dayWed           25928.10   9933.800  2.610089 9.183624e-03
## year2021:monthOct:dayWed           25122.20   9829.779  2.555724 1.073978e-02
## year2022:monthOct:dayWed           20911.85   9829.779  2.127398 3.362480e-02
## year2021:monthNov:dayWed           21709.55   8721.590  2.489173 1.296145e-02
## year2022:monthNov:dayWed           17379.95   8602.923  2.020238 4.361795e-02
## year2021:monthDec:dayWed           19020.25   8602.923  2.210905 2.726211e-02
## year2022:monthMar:dayThu           17375.45   8602.923  2.019715 4.367231e-02
## year2021:monthApr:dayThu           26236.45   8602.923  3.049713 2.349321e-03
## year2022:monthApr:dayThu           29811.10   8721.590  3.418081 6.553920e-04
## year2023:monthApr:dayThu           24640.80   8602.923  2.864236 4.265287e-03
## year2021:monthMay:dayThu           24994.48   9474.795  2.637997 8.465824e-03
## year2022:monthMay:dayThu           26721.70   8602.923  3.106119 1.947612e-03
## year2023:monthMay:dayThu           20261.95   8721.590  2.323195 2.036326e-02
## year2022:monthJun:dayThu           30472.90   8721.590  3.493962 4.962031e-04
## year2023:monthJun:dayThu           22893.35   8721.590  2.624906 8.796036e-03
## year2021:monthJul:dayThu           45121.30  10825.108  4.168208 3.328675e-05
## year2022:monthJul:dayThu           18554.25   8602.923  2.156738 3.125762e-02
## year2023:monthJul:dayThu           17216.75   8602.923  2.001267 4.562633e-02
## year2021:monthAug:dayThu           19868.15   8602.923  2.309465 2.111545e-02
## year2021:monthSep:dayThu           21123.35   9829.779  2.148914 3.187441e-02
## year2022:monthSep:dayThu           27357.95   9829.779  2.783171 5.481793e-03
## year2021:monthOct:dayThu           21539.95   9829.779  2.191296 2.865445e-02
## year2022:monthOct:dayThu           19646.85   9829.779  1.998707 4.590325e-02
## year2021:monthNov:dayThu           27693.80   8721.590  3.175315 1.541301e-03
## year2022:monthNov:dayThu           27214.50   8721.590  3.120360 1.856705e-03
## year2021:monthDec:dayThu           20720.25   8602.923  2.408513 1.619297e-02
## year2021:monthApr:dayFri           24500.05   8602.923  2.847875 4.488989e-03
## year2022:monthApr:dayFri           23176.85   8721.590  2.657411 7.996573e-03
## year2023:monthApr:dayFri           20370.85   8721.590  2.335681 1.969962e-02
## year2021:monthMay:dayFri           25442.03   9255.272  2.748923 6.084002e-03
## year2022:monthMay:dayFri           21365.00   8482.597  2.518686 1.193035e-02
## year2022:monthJun:dayFri           26874.15   8838.664  3.040522 2.421559e-03
## year2023:monthJun:dayFri           17485.20   8721.590  2.004818 4.524461e-02
## year2021:monthJul:dayFri           27986.00  10729.731  2.608267 9.232322e-03
## year2021:monthAug:dayFri           17570.20   8482.597  2.071323 3.857787e-02
## year2022:monthAug:dayFri           19942.15   8721.590  2.286527 2.242629e-02
## year2022:monthSep:dayFri           21822.50   9829.779  2.220040 2.663377e-02
## year2021:monthNov:dayFri           23079.60   8602.923  2.682762 7.418967e-03
## year2022:monthNov:dayFri           25628.30   8721.590  2.938489 3.371907e-03
## 
## $eat_out
##                           Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)               15064.00   3210.781  4.691693 3.066454e-06
## monthApr                  19324.50   4540.731  4.255813 2.268708e-05
## monthMay                  29109.00   4307.715  6.757410 2.322287e-11
## monthJun                  31047.75   4540.731  6.837611 1.362723e-11
## monthJul                  26207.00   4540.731  5.771538 1.032751e-08
## monthAug                  29033.00   7179.526  4.043861 5.641609e-05
## monthSep                  25344.50   4540.731  5.581591 3.031570e-08
## monthOct                  10412.00   4540.731  2.293023 2.204326e-02
## monthNov                  13346.40   4307.715  3.098255 1.998279e-03
## dayMon                     9773.00   4540.731  2.152297 3.160093e-02
## dayTue                    11842.00   4540.731  2.607950 9.237654e-03
## dayWed                     8656.80   4307.715  2.009604 4.472816e-02
## dayThu                    10536.80   4307.715  2.446030 1.460686e-02
## dayFri                     9714.60   4307.715  2.255163 2.432799e-02
## year2021:monthMar         13120.85   6092.029  2.153773 3.148460e-02
## year2023:monthApr        -13439.50   6092.029 -2.206079 2.759399e-02
## year2021:monthMay        -13702.80   5920.388 -2.314510 2.083159e-02
## year2022:monthMay        -15645.40   5920.388 -2.642631 8.348660e-03
## year2023:monthMay        -16880.40   6092.029 -2.770899 5.688676e-03
## year2022:monthJun        -19145.55   6258.965 -3.058900 2.277721e-03
## year2023:monthJun        -19684.65   6258.965 -3.145033 1.707255e-03
## year2023:monthJul        -17511.80   6092.029 -2.874543 4.127611e-03
## year2023:monthAug        -20803.65   8372.694 -2.484702 1.312060e-02
## year2022:monthSep        -28597.05   6258.965 -4.568974 5.481290e-06
## year2023:monthSep        -15953.40   6258.965 -2.548888 1.094778e-02
## year2022:monthNov        -20327.45   6092.029 -3.336729 8.773584e-04
## year2022:monthDec        -13319.30   6258.965 -2.128035 3.356612e-02
## year2021:dayMon          -12834.60   6258.965 -2.050595 4.055380e-02
## year2021:dayTue          -15730.10   6258.965 -2.513211 1.211233e-02
## year2021:dayWed          -13123.90   6092.029 -2.154274 3.144523e-02
## year2021:dayThu          -13619.90   6092.029 -2.235692 2.558111e-02
## monthApr:dayMon          -28352.25   6421.563 -4.415163 1.113503e-05
## monthMay:dayMon          -21863.50   6258.965 -3.493149 4.972075e-04
## monthJun:dayMon          -19731.35   6258.965 -3.152494 1.664621e-03
## monthJul:dayMon          -18006.00   6421.563 -2.803990 5.140297e-03
## monthSep:dayMon          -16054.75   6421.563 -2.500131 1.256605e-02
## monthNov:dayMon          -16653.60   6092.029 -2.733670 6.368366e-03
## monthApr:dayTue          -32976.75   6421.563 -5.135315 3.354582e-07
## monthMay:dayTue          -27881.50   6258.965 -4.454650 9.301584e-06
## monthJun:dayTue          -18558.15   6258.965 -2.965051 3.094600e-03
## monthJul:dayTue          -16647.75   6421.563 -2.592476 9.660920e-03
## monthSep:dayTue          -12837.30   6258.965 -2.051026 4.051169e-02
## monthNov:dayTue          -15307.40   6258.965 -2.445676 1.462112e-02
## monthApr:dayWed          -28340.10   6092.029 -4.651997 3.705762e-06
## monthMay:dayWed          -22482.30   6092.029 -3.690445 2.353367e-04
## monthJun:dayWed          -23639.30   6258.965 -3.776870 1.677081e-04
## monthJul:dayWed          -15798.20   6092.029 -2.593257 9.639150e-03
## monthSep:dayWed          -14561.30   6092.029 -2.390222 1.701336e-02
## monthApr:dayThu          -29526.30   6092.029 -4.846710 1.444003e-06
## monthMay:dayThu          -23530.05   6092.029 -3.862432 1.191203e-04
## monthJun:dayThu          -24019.30   6258.965 -3.837583 1.316530e-04
## monthJul:dayThu          -16849.20   6092.029 -2.765778 5.778117e-03
## monthSep:dayThu          -14411.05   6258.965 -2.302465 2.150345e-02
## monthNov:dayThu          -16503.20   6092.029 -2.708982 6.858589e-03
## monthApr:dayFri          -25850.60   6258.965 -4.130171 3.912313e-05
## monthMay:dayFri          -20046.60   5920.388 -3.386028 7.353333e-04
## monthJun:dayFri          -20042.35   6258.965 -3.202183 1.404804e-03
## monthSep:dayFri          -13300.85   6258.965 -2.125088 3.381187e-02
## monthNov:dayFri          -15748.00   6092.029 -2.585017 9.871076e-03
## year2021:monthApr:dayMon  23106.60   8967.225  2.576784 1.010776e-02
## year2022:monthApr:dayMon  24330.15   8851.514  2.748699 6.085627e-03
## year2023:monthApr:dayMon  22528.95   8734.270  2.579374 1.003276e-02
## year2021:monthMay:dayMon  22942.70   8615.431  2.662978 7.863506e-03
## year2022:monthMay:dayMon  19855.10   8494.929  2.337288 1.961101e-02
## year2021:monthJun:dayMon  17956.45   8851.514  2.028630 4.274744e-02
## year2022:monthJun:dayMon  25240.00   8734.270  2.889766 3.934432e-03
## year2023:monthJun:dayMon  18180.95   8734.270  2.081565 3.762356e-02
## year2021:monthJul:dayMon  18756.85   8967.225  2.091712 3.670362e-02
## year2022:monthJul:dayMon  17515.15   8734.270  2.005336 4.518285e-02
## year2022:monthSep:dayMon  21512.40   8851.514  2.430364 1.525002e-02
## year2021:monthNov:dayMon  23339.05   8615.431  2.708982 6.858587e-03
## year2022:monthNov:dayMon  23185.50   8615.431  2.691160 7.233318e-03
## year2021:monthDec:dayMon  19586.85   8851.514  2.212825 2.712380e-02
## year2021:monthApr:dayTue  30099.60   8967.225  3.356624 8.172260e-04
## year2022:monthApr:dayTue  34237.05   8967.225  3.818021 1.423841e-04
## year2023:monthApr:dayTue  31581.45   8734.270  3.615809 3.136056e-04
## year2021:monthMay:dayTue  28709.55   8734.270  3.287001 1.046087e-03
## year2022:monthMay:dayTue  28903.20   8615.431  3.354818 8.225226e-04
## year2023:monthMay:dayTue  24224.90   8615.431  2.811804 5.018005e-03
## year2021:monthJun:dayTue  25650.10   8734.270  2.936719 3.389279e-03
## year2022:monthJun:dayTue  33506.45   8851.514  3.785392 1.621386e-04
## year2023:monthJun:dayTue  18172.00   8734.270  2.080540 3.771755e-02
## year2021:monthJul:dayTue  20370.85   8967.225  2.271701 2.330586e-02
## year2022:monthJul:dayTue  18737.30   8851.514  2.116847 3.450709e-02
## year2022:monthSep:dayTue  21560.10   8851.514  2.435753 1.502601e-02
## year2021:monthOct:dayTue  18203.15   8851.514  2.056501 3.998043e-02
## year2022:monthOct:dayTue  21670.55   8851.514  2.448231 1.451843e-02
## year2021:monthNov:dayTue  26574.35   8734.270  3.042538 2.404114e-03
## year2022:monthNov:dayTue  24890.75   8734.270  2.849780 4.460312e-03
## year2021:monthDec:dayTue  17859.55   8851.514  2.017683 4.387779e-02
## year2021:monthApr:dayWed  23372.95   8734.270  2.676005 7.566319e-03
## year2022:monthApr:dayWed  26537.40   8734.270  3.038308 2.437823e-03
## year2023:monthApr:dayWed  26272.85   8615.431  3.049511 2.349485e-03
## year2021:monthMay:dayWed  25537.10   8615.431  2.964112 3.103979e-03
## year2022:monthMay:dayWed  20185.45   8615.431  2.342941 1.931795e-02
## year2023:monthMay:dayWed  19319.55   8615.431  2.242436 2.514089e-02
## year2021:monthJun:dayWed  34076.45   8734.270  3.901465 1.016834e-04
## year2022:monthJun:dayWed  30469.30   8734.270  3.488477 5.058763e-04
## year2023:monthJun:dayWed  25054.45   8851.514  2.830527 4.735604e-03
## year2021:monthJul:dayWed  22908.05   8734.270  2.622778 8.847697e-03
## year2021:monthSep:dayWed  20671.85   8615.431  2.399398 1.659517e-02
## year2022:monthSep:dayWed  23897.10   8734.270  2.736016 6.323476e-03
## year2021:monthNov:dayWed  21709.55   8734.270  2.485560 1.308921e-02
## year2022:monthNov:dayWed  17379.95   8615.431  2.017305 4.391730e-02
## year2021:monthDec:dayWed  19020.25   8615.431  2.207696 2.748068e-02
## year2022:monthMar:dayThu  17375.45   8615.431  2.016783 4.397190e-02
## year2021:monthApr:dayThu  26236.45   8615.431  3.045286 2.382449e-03
## year2022:monthApr:dayThu  29811.10   8734.270  3.413119 6.667026e-04
## year2023:monthApr:dayThu  24640.80   8615.431  2.860078 4.319112e-03
## year2021:monthMay:dayThu  25310.35   8615.431  2.937793 3.377664e-03
## year2022:monthMay:dayThu  26721.70   8615.431  3.101609 1.975981e-03
## year2023:monthMay:dayThu  20261.95   8734.270  2.319822 2.054119e-02
## year2021:monthJun:dayThu  30431.40   8851.514  3.437988 6.089962e-04
## year2022:monthJun:dayThu  30472.90   8734.270  3.488889 5.051059e-04
## year2023:monthJun:dayThu  22893.35   8734.270  2.621095 8.891207e-03
## year2021:monthJul:dayThu  26310.30   8615.431  3.053858 2.316007e-03
## year2022:monthJul:dayThu  18554.25   8615.431  2.153607 3.149769e-02
## year2023:monthJul:dayThu  17216.75   8615.431  1.998362 4.593444e-02
## year2022:monthSep:dayThu  23232.20   8734.270  2.659890 7.935455e-03
## year2021:monthNov:dayThu  27693.80   8734.270  3.170706 1.564645e-03
## year2022:monthNov:dayThu  27214.50   8734.270  3.115830 1.883970e-03
## year2021:monthDec:dayThu  20720.25   8615.431  2.405016 1.634365e-02
## year2021:monthApr:dayFri  24500.05   8615.431  2.843741 4.545066e-03
## year2022:monthApr:dayFri  23176.85   8734.270  2.653553 8.084988e-03
## year2023:monthApr:dayFri  20370.85   8734.270  2.332290 1.987336e-02
## year2021:monthMay:dayFri  20349.90   8372.694  2.430508 1.524399e-02
## year2022:monthMay:dayFri  21365.00   8494.929  2.515030 1.205040e-02
## year2022:monthJun:dayFri  26874.15   8851.514  3.036108 2.455523e-03
## year2023:monthJun:dayFri  17485.20   8734.270  2.001907 4.555107e-02
## year2022:monthSep:dayFri  19031.25   8734.270  2.178917 2.955917e-02
## year2021:monthNov:dayFri  23079.60   8615.431  2.678868 7.502379e-03
## year2022:monthNov:dayFri  25628.30   8734.270  2.934224 3.416428e-03
# A filter is applied so that only the significant results are shown

Model Diagnostics

# Plot diagnostic plots for each interaction model to check for any violations of regression assumptions
par(mfrow = c(2, 2))
plot(interaction_model_wfh)
## Warning: not plotting observations with leverage one:
##   268, 269, 270, 271, 272, 572, 707

plot(interaction_model_rule_of_6)
## Warning: not plotting observations with leverage one:
##   251, 572

plot(interaction_model_eat_out)
## Warning: not plotting observations with leverage one:
##   214, 215, 244

Interaction model comparison

# Compare models including time variables with interaction models
anova(interaction_model_wfh, multiple_model_time)
## Analysis of Variance Table
## 
## Model 1: Hires ~ wfh * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year + 
##     month + day
##   Res.Df        RSS   Df   Sum of Sq      F    Pr(>F)    
## 1   1027 4.0895e+10                                      
## 2   1346 7.6441e+10 -319 -3.5546e+10 2.7983 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(interaction_model_rule_of_6, multiple_model_time)
## Analysis of Variance Table
## 
## Model 1: Hires ~ rule_of_6_indoors * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year + 
##     month + day
##   Res.Df        RSS   Df   Sum of Sq      F    Pr(>F)    
## 1   1027 4.2227e+10                                      
## 2   1346 7.6441e+10 -319 -3.4214e+10 2.6085 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(interaction_model_eat_out, multiple_model_time)
## Analysis of Variance Table
## 
## Model 1: Hires ~ eat_out_to_help_out * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year + 
##     month + day
##   Res.Df        RSS   Df  Sum of Sq      F    Pr(>F)    
## 1   1052 4.3381e+10                                     
## 2   1346 7.6441e+10 -294 -3.306e+10 2.7269 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
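
As a check on how the F-statistics in these tables arise, the sketch below recomputes them from the printed (rounded) RSS, Df and Sum of Sq values. The helper name `f_from_table` is hypothetical; the formula is the standard one for comparing two linear models, with the larger model's residual mean square in the denominator.

```r
# Recompute the ANOVA F-statistics from the rounded values printed above.
# f_from_table is a hypothetical helper, not part of the original analysis.
f_from_table <- function(sum_sq, df_diff, rss_full, res_df_full) {
  # F = (change in RSS per extra parameter) / (residual mean square of the larger model)
  (sum_sq / df_diff) / (rss_full / res_df_full)
}

f_wfh       <- f_from_table(3.5546e10, 319, 4.0895e10, 1027)
f_rule_of_6 <- f_from_table(3.4214e10, 319, 4.2227e10, 1027)
f_eat_out   <- f_from_table(3.3060e10, 294, 4.3381e10, 1052)

round(c(wfh = f_wfh, rule_of_6 = f_rule_of_6, eat_out = f_eat_out), 4)
```

Up to rounding of the printed inputs, these values agree with the F-statistics reported by anova(). Each interaction model omits the other two policy variables, so the formulas being compared are not strictly nested, and the F-tests are best read with that in mind.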

Interaction model multicollinearity check

# Check for multicollinearity in the interaction models
vif(interaction_model_wfh, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
##       GVIF  Df GVIF^(1/(2*Df))   Interacts With Other Predictors
## wfh      1 671               1 year, month, day             --  
## year     1 671               1  wfh, month, day             --  
## month    1 671               1   wfh, year, day             --  
## day      1 671               1 wfh, year, month             --
vif(interaction_model_rule_of_6, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
##                   GVIF  Df GVIF^(1/(2*Df))                 Interacts With
## rule_of_6_indoors    1 671               1               year, month, day
## year                 1 671               1  rule_of_6_indoors, month, day
## month                1 671               1   rule_of_6_indoors, year, day
## day                  1 671               1 rule_of_6_indoors, year, month
##                   Other Predictors
## rule_of_6_indoors             --  
## year                          --  
## month                         --  
## day                           --
vif(interaction_model_eat_out, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
##                     GVIF  Df GVIF^(1/(2*Df))                   Interacts With
## eat_out_to_help_out    1 671               1                 year, month, day
## year                   1 671               1  eat_out_to_help_out, month, day
## month                  1 671               1   eat_out_to_help_out, year, day
## day                    1 671               1 eat_out_to_help_out, year, month
##                     Other Predictors
## eat_out_to_help_out             --  
## year                            --  
## month                           --  
## day                             --

Interaction model estimated marginal means

# Estimate marginal means for the interaction models
emm_wfh <- emmeans(interaction_model_wfh, ~ wfh | year | month | day)
emm_rule_of_6 <- emmeans(interaction_model_rule_of_6, ~ rule_of_6_indoors | year | month | day)
emm_eat_out <- emmeans(interaction_model_eat_out, ~ eat_out_to_help_out | year | month | day)

process_emmeans <- function(emm) {
  emm_summary <- summary(emm)
  emm_filtered <- emm_summary %>%
    group_by(year) %>%
    filter(!(is.na(lower.CL) & is.na(upper.CL))) %>%
    ungroup()
  return(emm_filtered)
}
# Apply the filter to each emmeans result
filtered_emm_wfh <- process_emmeans(emm_wfh)
filtered_emm_rule_of_6 <- process_emmeans(emm_rule_of_6)
filtered_emm_eat_out <- process_emmeans(emm_eat_out)

# Combine and display the filtered results
filtered_results <- list(WFH = filtered_emm_wfh, Rule_of_6 = filtered_emm_rule_of_6, Eat_Out = filtered_emm_eat_out)
filtered_results
## $WFH
## # A tibble: 343 × 9
##      wfh year  month day   emmean    SE    df lower.CL upper.CL
##    <dbl> <fct> <fct> <fct>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     0 2020  Jan   Sun   15064. 3155.  1027    8873.   21255.
##  2     1 2021  Jan   Sun   15138. 2822.  1027    9600.   20675.
##  3     1 2022  Jan   Sun   22780. 2822.  1027   17242.   28317.
##  4     1 2023  Jan   Sun   12607. 2822.  1027    7070.   18145.
##  5     0 2020  Feb   Sun   10823. 3155.  1027    4632.   17015.
##  6     1 2021  Feb   Sun   21346. 3155.  1027   15154.   27537.
##  7     1 2022  Feb   Sun   19858. 3155.  1027   13667.   26049.
##  8     1 2023  Feb   Sun   16806. 3155.  1027   10615.   22997.
##  9     0 2020  Mar   Sun   15005. 3643.  1027    7856.   22154.
## 10     1 2020  Mar   Sun   11789. 4462.  1027    3033.   20545.
## # ℹ 333 more rows
## 
## $Rule_of_6
## # A tibble: 343 × 9
##    rule_of_6_indoors year  month day   emmean    SE    df lower.CL upper.CL
##                <dbl> <fct> <fct> <fct>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1                 0 2020  Jan   Sun   15064. 3206.  1027    8773.   21355.
##  2                 0 2021  Jan   Sun   15138. 2868.  1027    9510.   20765.
##  3                 0 2022  Jan   Sun   22780. 2868.  1027   17153.   28407.
##  4                 0 2023  Jan   Sun   12607. 2868.  1027    6980.   18235.
##  5                 0 2020  Feb   Sun   10823. 3206.  1027    4532.   17115.
##  6                 0 2021  Feb   Sun   21346. 3206.  1027   15054.   27637.
##  7                 0 2022  Feb   Sun   19858. 3206.  1027   13567.   26149.
##  8                 0 2023  Feb   Sun   16806. 3206.  1027   10515.   23097.
##  9                 0 2020  Mar   Sun   13719. 2868.  1027    8092.   19346.
## 10                 0 2021  Mar   Sun   26913. 3206.  1027   20622.   33205.
## # ℹ 333 more rows
## 
## $Eat_Out
## # A tibble: 318 × 9
##    eat_out_to_help_out year  month day   emmean    SE    df lower.CL upper.CL
##                  <dbl> <fct> <fct> <fct>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1                   0 2020  Jan   Sun   15064. 3211.  1052    8764.   21364.
##  2                   0 2021  Jan   Sun   15138. 2872.  1052    9502.   20773.
##  3                   0 2022  Jan   Sun   22780. 2872.  1052   17145.   28415.
##  4                   0 2023  Jan   Sun   12607. 2872.  1052    6972.   18243.
##  5                   0 2020  Feb   Sun   10823. 3211.  1052    4523.   17124.
##  6                   0 2021  Feb   Sun   21346. 3211.  1052   15045.   27646.
##  7                   0 2022  Feb   Sun   19858. 3211.  1052   13558.   26158.
##  8                   0 2023  Feb   Sun   16806. 3211.  1052   10506.   23106.
##  9                   0 2020  Mar   Sun   13719. 2872.  1052    8084.   19354.
## 10                   0 2021  Mar   Sun   26913. 3211.  1052   20613.   33214.
## # ℹ 308 more rows
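
The confidence limits in these tables follow from each estimated marginal mean, its standard error and the residual degrees of freedom. As a hedged check, the sketch below recomputes the interval for the first WFH row from the rounded values shown above, assuming the default 95% level. Small discrepancies are due to rounding in the printed output.

```r
# First row of the $WFH table above (values as printed, i.e. rounded)
emmean <- 15064
se     <- 3155
df     <- 1027

# 95% two-sided t critical value, assuming the default confidence level
t_crit <- qt(0.975, df)

lower_cl <- emmean - t_crit * se
upper_cl <- emmean + t_crit * se
round(c(lower.CL = lower_cl, upper.CL = upper_cl))
```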

Plotting Interaction Emmeans

filtered_emm_wfh$variable <- "WFH"
filtered_emm_rule_of_6$variable <- "Rule of 6 Indoors"
filtered_emm_eat_out$variable <- "Eat Out to Help Out"

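# Standardize column names so the three tables can be combined with rbind
# (the policy column of every table takes the name used in the first table, wfh)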
standardize_colnames <- function(df, standard_names) {
  colnames(df) <- standard_names
  return(df)
}
standard_names <- colnames(filtered_emm_wfh)

filtered_emm_wfh <- standardize_colnames(filtered_emm_wfh, standard_names)
filtered_emm_rule_of_6 <- standardize_colnames(filtered_emm_rule_of_6, standard_names)
filtered_emm_eat_out <- standardize_colnames(filtered_emm_eat_out, standard_names)
combined_filtered_emm <- rbind(filtered_emm_wfh, filtered_emm_rule_of_6, filtered_emm_eat_out)

# Plotting emmeans
plot_emmeans <- function(emm_filtered, title) {
  ggplot(emm_filtered, aes(x = year, y = emmean, group = interaction(year, month, day), color = factor(month))) +
    geom_line() +
    geom_point() +
    facet_wrap(~ variable, scales = "free") +
    labs(title = title, x = "Year", y = "Estimated Marginal Mean") +
    theme_minimal() +
    theme(legend.position = "bottom") +
    guides(color = guide_legend(title = "Month"))
}

plot_combined_emm <- plot_emmeans(combined_filtered_emm, "EMMeans for Interaction Models")
print(plot_combined_emm)
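
The renaming step above gives every policy column the name of the first table (`wfh`) so that `rbind` will accept the three tables. A possible alternative, sketched below on hypothetical miniature tables, is to rename the policy column to a neutral name before combining. The names `to_common` and `policy_value` are illustrative and not part of the original code.

```r
# Hypothetical miniature stand-ins for two of the filtered emmeans tables
emm_wfh_small <- data.frame(wfh = c(0, 1), year = c("2020", "2021"),
                            emmean = c(15064, 15138))
emm_rule_small <- data.frame(rule_of_6_indoors = c(0, 0), year = c("2020", "2021"),
                             emmean = c(15064, 15138))

# Rename the policy indicator (assumed to be the first column) and label the table
to_common <- function(df, label) {
  names(df)[1] <- "policy_value"
  df$variable <- label
  df
}

combined_small <- rbind(
  to_common(emm_wfh_small, "WFH"),
  to_common(emm_rule_small, "Rule of 6 Indoors")
)
combined_small
```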

Analysis

This analysis focuses on the effect of COVID-19 policies on bike hires in London. During data exploration, no NA or missing values were found. The histogram also suggested that the data are approximately normally distributed.

I also checked the correlations between the variables considered in the analysis of bike hire trends during the COVID-19 pandemic in London. WFH shows a weak positive correlation with all the other variables. The rule of 6 indoors, on the other hand, shows negative correlations, with coefficients of around -0.09 to -0.15. Although these are weak, the negative sign suggests that when another set of restrictions was in place, the rule of 6 was slightly less likely to be in effect. The Eat Out to Help Out scheme shows very weak negative correlations with the other variables, with coefficients of around -0.05 to -0.08. For the time variables, the day of the week has no significant correlation with the restrictions. The month shows negative correlations with coefficients of around -0.19 to -0.31, which might indicate a seasonal pattern in the restrictions, while the year shows negative correlations with the closures and a positive correlation with WFH, indicating a possible change in restriction policies over the years.

ANALYSIS 1

Analysis 1 focuses on the relationship between each COVID-19 policy and bike hires. Simple regressions were used to estimate the effect of each policy separately. WFH is associated with significantly fewer bike hires, with a coefficient of -4289.7 (p < 2e-16), indicating that an increase in working from home is associated with a decrease in bike hires. The ‘rule of six’ indoors policy, on the other hand, appears to encourage bike use, showing a positive effect with a coefficient of 7412 (p < 2e-16). Similarly, the ‘eat out to help out’ scheme is positively associated with bike hires, with a coefficient of 7738.2 (p = 0.000102).

In this part I also compared the results when these variables (WFH, rule of six, and eat out to help out) were assessed together, and when time factors were added as further predictors. The combined effects remain significant, with working from home having a coefficient of -4208.7 (p < 2e-16), the ‘rule of six’ showing a positive coefficient of 8054.2 (p < 2e-16), and ‘eat out to help out’ having a coefficient of 4883.4 (p = 0.0154). The variance inflation factors for this model without time variables are low, indicating minimal multicollinearity.

When the time variables are included, the Multiple R-squared increases to 0.4799, meaning that nearly 48% of the variability in bike hires is explained once time factors are considered. Moreover, year2022 has a significant positive coefficient of 5291.9 (p < 2e-16), indicating an increase in bike hires in that year compared to the baseline year. The month coefficients are also highly significant, with May and June showing the largest positive effects on bike hires, 14409.0 (p < 2e-16) and 18811.1 (p < 2e-16), respectively. Days of the week were less significant, though Tuesday to Saturday all had positive coefficients, with Saturday (daySat) having the highest at 3679.6 (p < 2e-16).

The ANOVA comparing the model without time variables to the model with them shows a stark difference in the residual sum of squares (RSS), which drops from 1.3914e+11 to 7.6441e+10, with an F-statistic of 55.199 and a p-value of less than 2.2e-16. This means that time variables play a crucial role in modeling bike hire patterns.
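
The F-statistic quoted above can be related to the two RSS values. The sketch below backs out the number of added time parameters implied by the reported figures, assuming the residual degrees of freedom of 1346 shown for the time model in the interaction-model ANOVA tables. The result is approximate because the inputs are rounded.

```r
rss_without_time <- 1.3914e11   # RSS of the model without time variables
rss_with_time    <- 7.6441e10   # RSS of the model with time variables
f_reported       <- 55.199
res_df_with_time <- 1346        # assumption: taken from the interaction-model ANOVA tables

# Share of the residual variation removed by adding the time variables
rss_reduction <- 1 - rss_with_time / rss_without_time

# F = ((RSS0 - RSS1) / k) / (RSS1 / res_df)  =>  solve for k
implied_k <- (rss_without_time - rss_with_time) /
  (f_reported * rss_with_time / res_df_with_time)

round(c(rss_reduction = rss_reduction, implied_k = implied_k), 3)
```

An implied value of about 20 would be consistent with 3 year, 11 month and 6 day dummy variables, and the RSS falls by roughly 45%.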

ANALYSIS 2

This part focuses on the interaction between COVID-19 policies, time variables, and bike hires. The wfh interaction model shows a significant baseline, with an intercept of 15064.00 (p = 2.06e-06). The wfh policy itself has a positive coefficient of 13319.30 (p = 0.0305). In the interaction model, August (monthAug) has a negative coefficient of -15506.65 (p = 0.0097), which suggests a decrease in bike hires during this month when more people work from home.

In the rule of six interaction model, April (monthApr) shows a significant positive effect of 19324.50 (p = 2.21e-05). However, the interaction between the rule of six indoors policy in 2021 (rule_of_6_indoors:year2021) and June (monthJun) has a negative coefficient of -30333.27 (p = 0.0018), which indicates a substantial reduction in bike hires under these combined conditions.

The eat out to help out interaction model presents a similar pattern, with significant positive coefficients for months such as May (29109.00, p < 2.2e-11) and June (31047.75, p < 2.2e-11). Among the interaction terms, April 2023 (year2023:monthApr) has a negative coefficient of -13439.50 (p = 0.0276).

plot(interaction_model_wfh)
## Warning: not plotting observations with leverage one:
##   268, 269, 270, 271, 272, 572, 707

plot(interaction_model_rule_of_6)
## Warning: not plotting observations with leverage one:
##   251, 572

plot(interaction_model_eat_out)
## Warning: not plotting observations with leverage one:
##   214, 215, 244

The next part is the diagnostic plots. The plots for the interaction models assessing the impact of work from home (WFH), the rule of 6, and the eat out policy on bike hires provide insights into the validity of the regression assumptions. Overall, these plots indicate that while the models capture a significant amount of variance, there may be concerns regarding outliers, leverage points, and the assumptions of linearity and equal variance that could affect the models’ reliability.
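
The "leverage one" warnings printed with these plots arise when an observation is fitted exactly by the model, for example when it is the only observation in its combination of factor levels. A minimal sketch with hypothetical toy data (not the bike data) illustrates this.

```r
# Toy data: group "b" contains a single observation
toy <- data.frame(
  g = factor(c("a", "a", "a", "b")),
  y = c(1, 2, 3, 10)
)
fit <- lm(y ~ g, data = toy)

# Leverage (hat value) of each observation; the lone "b" point has leverage 1
lev <- hatvalues(fit)
lev

# Its residual is therefore zero, so it carries no information about model error
resid(fit)[4]
```

Observations like this cannot be shown in the residuals-versus-leverage panel, which is why plot() reports them separately.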

The VIF analysis of the interaction models suggests that multicollinearity is not a concern for any of the predictors. All the predictors have a GVIF value of 1, which indicates no variance inflation. Moreover, the degrees of freedom for all predictors is 671. However, vif() also warned that a standard deviation was zero, so these values should be interpreted with caution.
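
The warning `the standard deviation is zero` that accompanies the VIF output suggests that some columns of the design matrix carry no independent information. One way to look for this, sketched here on hypothetical toy data rather than the bike data, is to count the coefficients that `lm()` could not estimate (reported as NA).

```r
set.seed(1)
# Toy data: the policy is active for the whole of 2020 and never in 2019,
# so it is perfectly confounded with year (an assumption made for illustration)
toy <- data.frame(
  year = factor(rep(c("2019", "2020"), each = 20)),
  day  = factor(rep(c("Mon", "Tue"), times = 20))
)
toy$policy <- as.numeric(toy$year == "2020")
toy$hires  <- 100 + 10 * toy$policy + rnorm(40)

fit <- lm(hires ~ policy * year * day, data = toy)

# Coefficients that could not be estimated because of exact collinearity
na_terms <- names(coef(fit))[is.na(coef(fit))]
na_terms

# Number of aliased coefficients
length(coef(fit)) - fit$rank
```

When many coefficients are aliased in this way, a GVIF of 1 may not be informative about how strongly the predictors are related.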

plot_combined_emm

For the emmeans analysis of the interaction models, I use a plot to make the results easier to read and interpret. The variation in the dots across months indicates changes in the response variable’s means due to the interactions of these three COVID-19 policies with time. There is a notable spread in the means for each month, suggesting a potential seasonal pattern or variation in policy effects over time. Moreover, the points overlap, which shows that the estimated means for different months are close to each other within the same year, yet there appears to be a discernible trend or shift in the means from year to year.

In conclusion, the inclusion of time variables significantly improves model fit, which shows the importance of considering temporal factors. The interaction models further reveal the complex interplay between COVID-19 policies and time, with significant variations observed across different months and years, underscoring the nuanced impact of these policies on bike hiring behaviors.


Question 2

Master Data

# Load the publisher sales data
book_sales <- read.csv("publisher_sales.csv", stringsAsFactors = TRUE)

Data Dictionary

| Variable | Description |
|---|---|
| sold.by | The book’s publisher name |
| publisher.type | The type of publisher |
| genre | The genre of the book |
| avg.review | The average of the review scores given to the book |
| daily.sales | The number of books sold |
| total.reviews | The number of reviews given to the book |
| sale.price | The price of the book |

Data Exploration

Checking the Data Structure and Summary

# Data structure and summary
str(book_sales)
## 'data.frame':    6000 obs. of  7 variables:
##  $ sold.by       : Factor w/ 13 levels "Amazon Digital Services,  Inc.",..: 1 6 1 1 1 1 11 13 1 13 ...
##  $ publisher.type: Factor w/ 5 levels "amazon","big five",..: 3 2 5 5 5 5 2 2 5 2 ...
##  $ genre         : Factor w/ 3 levels "adult_fiction",..: 1 3 1 3 3 1 2 2 2 2 ...
##  $ avg.review    : num  4.5 4.64 2.5 4.5 4.98 3.98 4.62 3.5 4.64 4.56 ...
##  $ daily.sales   : num  84.4 113.1 70.8 149.4 135.7 ...
##  $ total.reviews : int  151 184 125 225 194 123 130 110 129 119 ...
##  $ sale.price    : num  5.12 6.91 6.27 4.91 7.39 ...
summary(book_sales)
##                                  sold.by           publisher.type
##  Amazon Digital Services,  Inc.      :4271   amazon       :  92  
##  Random House LLC                    : 486   big five     :1688  
##  Penguin Group (USA) LLC             : 346   indie        :1243  
##  HarperCollins Publishers            : 274   single author: 789  
##  Simon and Schuster Digital Sales Inc: 241   small/medium :2188  
##  Macmillan                           : 172                       
##  (Other)                             : 210                       
##            genre        avg.review     daily.sales     total.reviews  
##  adult_fiction:2000   Min.   :0.000   Min.   : -0.53   Min.   :  0.0  
##  non_fiction  :2000   1st Qu.:4.090   1st Qu.: 64.22   1st Qu.:104.0  
##  YA_fiction   :2000   Median :4.410   Median : 81.59   Median :128.0  
##                       Mean   :4.269   Mean   : 86.37   Mean   :132.7  
##                       3rd Qu.:4.610   3rd Qu.:104.73   3rd Qu.:163.0  
##                       Max.   :4.980   Max.   :209.34   Max.   :243.0  
##                                                                       
##    sale.price   
##  Min.   : 1.17  
##  1st Qu.: 7.34  
##  Median : 9.31  
##  Mean   :10.30  
##  3rd Qu.:13.58  
##  Max.   :22.00  
## 

Checking Data Quality & Cleansing Data

# Check for missing values
sum(is.na(book_sales))
## [1] 0
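
A count of NA values does not catch values that are present but implausible. The summary above, for instance, reports a minimum daily.sales of -0.53. A possible additional check is sketched below on a hypothetical toy data frame that reuses the column names from the data dictionary. The 0 to 5 review scale is an assumption.

```r
# Hypothetical toy rows using the same column names as book_sales
toy_sales <- data.frame(
  daily.sales   = c(84.4, -0.53, 113.1),
  avg.review    = c(4.50, 4.64, 0.00),
  total.reviews = c(151, 184, 0),
  sale.price    = c(5.12, 6.91, 11.66)
)

# Count values outside their plausible ranges
n_negative_sales <- sum(toy_sales$daily.sales < 0)
# Assumption: review scores lie on a 0 to 5 scale
n_bad_reviews    <- sum(toy_sales$avg.review < 0 | toy_sales$avg.review > 5)
n_bad_price      <- sum(toy_sales$sale.price <= 0)

c(negative_sales = n_negative_sales,
  bad_reviews    = n_bad_reviews,
  bad_price      = n_bad_price)
```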

Data Visualization

Histogram

# Histograms for various variables
p1 <- ggplot(book_sales, aes(x = daily.sales)) + 
  geom_histogram(binwidth = 1, fill = "blue") + 
  labs(title = "Histogram of Daily Sales")

p2 <- ggplot(book_sales, aes(x = sale.price)) + 
  geom_histogram(binwidth = 1, fill = "darkgreen") + 
  labs(title = "Histogram of Sale Price")

p3 <- ggplot(book_sales, aes(x = avg.review)) + 
  geom_histogram(binwidth = 0.1, fill = "red") + 
  labs(title = "Histogram of Average Review Scores")

p4 <- ggplot(book_sales, aes(x = total.reviews)) + 
  geom_histogram(binwidth = 10, fill = "orange") + 
  labs(title = "Histogram of Total Number of Reviews")

# Combine plots into a grid
grid.arrange(p1, p2, p3, p4, ncol = 2)

Boxplot

# Boxplot for daily sales
daily_sales_boxplot <- ggplot(book_sales, aes(y = daily.sales)) +
  geom_boxplot(fill = 'blue') +
  labs(x = "", y = "Daily Sales",
       title = "Boxplot of Daily Sales") +
  theme_classic()

# Boxplot for sales price
sales_price_boxplot <- ggplot(book_sales, aes(y = sale.price)) +
  geom_boxplot(fill = 'darkgreen') +
  labs(x = "", y = "Sale Price",
       title = "Boxplot of Sale Price") +
  theme_classic()

# Boxplot for average review
avg_review_boxplot <- ggplot(book_sales, aes(y = avg.review)) +
  geom_boxplot(fill = 'red') +
  labs(x = "", y = "Average Review",
       title = "Boxplot of Average Review Scores") +
  theme_classic()

# Boxplot for total reviews
total_review_boxplot <- ggplot(book_sales, aes(y = total.reviews)) +
  geom_boxplot(fill = 'orange') +
  labs(x = "", y = "Total Reviews",
       title = "Boxplot of Total Number of Reviews") +
  theme_classic()

# Arrange the boxplots into a 2x2 grid
grid.arrange(daily_sales_boxplot, sales_price_boxplot,
             avg_review_boxplot, total_review_boxplot,
             ncol = 2, nrow = 2)
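
The points drawn beyond the whiskers of a boxplot can also be listed numerically. As a sketch, `boxplot.stats()` from base R flags values using the usual 1.5 × IQR rule. The vector below is hypothetical and not taken from book_sales, and ggplot2 computes its quartiles slightly differently, so results may differ at the margins.

```r
# Hypothetical values with one obvious extreme observation
x <- c(1:10, 100)

# boxplot.stats() flags points more than 1.5 * IQR beyond the hinges
stats <- boxplot.stats(x)

# Flagged values
stats$out

# Their positions in the vector
which(x %in% stats$out)
```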

# Identify rows where both avg.review and total.reviews are equal to 0
zero_review_rows <- book_sales$avg.review == 0 & book_sales$total.reviews == 0

# Display rows with zero reviews
print(book_sales[zero_review_rows, ])
##                                   sold.by publisher.type         genre
## 476        Amazon Digital Services,  Inc.          indie   non_fiction
## 599        Amazon Digital Services,  Inc.  single author    YA_fiction
## 668               Penguin Group (USA) LLC       big five   non_fiction
## 756        Amazon Digital Services,  Inc.          indie    YA_fiction
## 1137       Amazon Digital Services,  Inc.          indie adult_fiction
## 1397       Amazon Digital Services,  Inc.          indie   non_fiction
## 1449                     Random House LLC       big five adult_fiction
## 1742       Amazon Digital Services,  Inc.   small/medium    YA_fiction
## 2523       Amazon Digital Services,  Inc.   small/medium adult_fiction
## 2734             HarperCollins Publishers       big five adult_fiction
## 2760       Amazon Digital Services,  Inc.          indie    YA_fiction
## 2860       Amazon Digital Services,  Inc.   small/medium adult_fiction
## 2967       Amazon Digital Services,  Inc.   small/medium    YA_fiction
## 3358       Amazon Digital Services,  Inc.   small/medium adult_fiction
## 3397       Amazon Digital Services,  Inc.          indie   non_fiction
## 3839       Amazon Digital Services,  Inc.   small/medium    YA_fiction
## 3945              Penguin Group (USA) LLC       big five   non_fiction
## 4152       Amazon Digital Services,  Inc.          indie   non_fiction
## 4392       Amazon Digital Services,  Inc.         amazon adult_fiction
## 4509       Amazon Digital Services,  Inc.          indie    YA_fiction
## 4543       Amazon Digital Services,  Inc.          indie    YA_fiction
## 4779             HarperCollins Publishers       big five   non_fiction
## 5413 Simon and Schuster Digital Sales Inc       big five    YA_fiction
##      avg.review daily.sales total.reviews sale.price
## 476           0       78.27             0      11.66
## 599           0      125.48             0       7.93
## 668           0       58.06             0      16.38
## 756           0      143.41             0       5.15
## 1137          0       68.63             0       9.92
## 1397          0       65.97             0      14.47
## 1449          0      114.20             0       9.89
## 1742          0       93.56             0       9.48
## 2523          0       63.08             0       7.83
## 2734          0       63.67             0      12.10
## 2760          0      163.70             0       7.25
## 2860          0       88.89             0       5.57
## 2967          0      103.08             0       8.34
## 3358          0       77.08             0       4.96
## 3397          0       74.70             0      14.35
## 3839          0       86.03             0      11.54
## 3945          0       78.21             0      18.39
## 4152          0       64.83             0      11.92
## 4392          0       81.49             0       5.11
## 4509          0      155.53             0       9.27
## 4543          0       92.15             0       8.10
## 4779          0       65.53             0      14.95
## 5413          0      103.13             0       8.82

Analysis 1: The Effect of Total Review & Average Review on Book Sales

Correlation Analysis

rcorr(as.matrix(select(book_sales, daily.sales, total.reviews, avg.review)), type = "spearman")
##               daily.sales total.reviews avg.review
## daily.sales          1.00          0.68      -0.01
## total.reviews        0.68          1.00       0.02
## avg.review          -0.01          0.02       1.00
## 
## n= 6000 
## 
## 
## P
##               daily.sales total.reviews avg.review
## daily.sales               0.0000        0.5597    
## total.reviews 0.0000                    0.1777    
## avg.review    0.5597      0.1777
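
The call to `rcorr()` with `type = "spearman"` reports rank correlations. As a reminder of what that means, the sketch below uses simulated, hypothetical data and base R only to show that a Spearman coefficient is the Pearson correlation of the ranks, which makes it less sensitive to skew and outliers.

```r
set.seed(42)
# Simulated, hypothetical data with a monotone but non-linear relationship
reviews <- rexp(200, rate = 1 / 130)
sales   <- sqrt(reviews) + rnorm(200)

spearman     <- cor(reviews, sales, method = "spearman")
pearson_rank <- cor(rank(reviews), rank(sales))  # Pearson correlation of the ranks

c(spearman = spearman, pearson_on_ranks = pearson_rank)
```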

Regression analysis

# Linear regression to predict daily sales from average review score and total reviews
lm_sales_reviews <- lm(daily.sales ~ avg.review + total.reviews, data = book_sales)
summary(lm_sales_reviews)
## 
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = book_sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.573  -14.529   -0.909   13.669  129.721 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.978718   2.357956  14.410   <2e-16 ***
## avg.review    -4.306466   0.514881  -8.364   <2e-16 ***
## total.reviews  0.533428   0.007749  68.836   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.59 on 5997 degrees of freedom
## Multiple R-squared:  0.4416, Adjusted R-squared:  0.4414 
## F-statistic:  2371 on 2 and 5997 DF,  p-value: < 2.2e-16

Regression Model With Emmeans Analysis

# Regression model for daily sales by total reviews
m.sales.by.total.review <- lm(daily.sales ~ total.reviews, data = book_sales)
summary(m.sales.by.total.review)
## 
## Call:
## lm(formula = daily.sales ~ total.reviews, data = book_sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.667  -14.694   -1.125   13.458  147.319 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   16.381371   1.070717   15.30   <2e-16 ***
## total.reviews  0.527490   0.007761   67.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.72 on 5998 degrees of freedom
## Multiple R-squared:  0.4351, Adjusted R-squared:  0.435 
## F-statistic:  4620 on 1 and 5998 DF,  p-value: < 2.2e-16
cbind(coef(m.sales.by.total.review), confint(m.sales.by.total.review))
##                              2.5 %     97.5 %
## (Intercept)   16.3813714 14.282382 18.4803612
## total.reviews  0.5274902  0.512276  0.5427044
m.sales.by.total.review.emm <- emmeans(m.sales.by.total.review, ~ total.reviews)

# Regression model for daily sales by average review
m.sales.by.avg.review <- lm(daily.sales ~ avg.review, data = book_sales)
summary(m.sales.by.avg.review)
## 
## Call:
## lm(formula = daily.sales ~ avg.review, data = book_sales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -86.721 -22.129  -4.722  18.267 123.403 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  90.8944     2.9543  30.767   <2e-16 ***
## avg.review   -1.0593     0.6859  -1.544    0.123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.22 on 5998 degrees of freedom
## Multiple R-squared:  0.0003974,  Adjusted R-squared:  0.0002308 
## F-statistic: 2.385 on 1 and 5998 DF,  p-value: 0.1226
cbind(coef(m.sales.by.avg.review), confint(m.sales.by.avg.review))
##                           2.5 %     97.5 %
## (Intercept) 90.894430 85.102983 96.6858777
## avg.review  -1.059283 -2.403959  0.2853935
m.sales.by.avg.review.emm <- emmeans(m.sales.by.avg.review, ~ avg.review)

# Multiple regression for daily sales using total reviews and average reviews
m.sales.by.total.avg.review <- lm(daily.sales ~ total.reviews + avg.review, data = book_sales)
summary(m.sales.by.total.avg.review)
## 
## Call:
## lm(formula = daily.sales ~ total.reviews + avg.review, data = book_sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.573  -14.529   -0.909   13.669  129.721 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.978718   2.357956  14.410   <2e-16 ***
## total.reviews  0.533428   0.007749  68.836   <2e-16 ***
## avg.review    -4.306466   0.514881  -8.364   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.59 on 5997 degrees of freedom
## Multiple R-squared:  0.4416, Adjusted R-squared:  0.4414 
## F-statistic:  2371 on 2 and 5997 DF,  p-value: < 2.2e-16
cbind(coef(m.sales.by.total.avg.review), confint(m.sales.by.total.avg.review))
##                               2.5 %     97.5 %
## (Intercept)   33.9787175 29.3562756 38.6011594
## total.reviews  0.5334285  0.5182371  0.5486198
## avg.review    -4.3064660 -5.3158175 -3.2971145
m.sales.by.total.avg.review.emm <- emmeans(m.sales.by.total.avg.review, ~ total.reviews + avg.review)

# Interaction effect between total reviews and average reviews
m.sales.by.total.avg.itr.review <- lm(daily.sales ~ total.reviews * avg.review, data = book_sales)
summary(m.sales.by.total.avg.itr.review)
## 
## Call:
## lm(formula = daily.sales ~ total.reviews * avg.review, data = book_sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -104.318  -14.401   -0.873   13.622   94.395 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               76.107846   4.169256  18.255  < 2e-16 ***
## total.reviews              0.139838   0.033199   4.212 2.57e-05 ***
## avg.review               -14.673445   0.991327 -14.802  < 2e-16 ***
## total.reviews:avg.review   0.095620   0.007848  12.184  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.32 on 5996 degrees of freedom
## Multiple R-squared:  0.4551, Adjusted R-squared:  0.4548 
## F-statistic:  1669 on 3 and 5996 DF,  p-value: < 2.2e-16
cbind(coef(m.sales.by.total.avg.itr.review), confint(m.sales.by.total.avg.itr.review))
##                                              2.5 %      97.5 %
## (Intercept)               76.10784576  67.93460422  84.2810873
## total.reviews              0.13983789   0.07475610   0.2049197
## avg.review               -14.67344515 -16.61680183 -12.7300885
## total.reviews:avg.review   0.09561996   0.08023496   0.1110050
m.sales.by.total.avg.itr.review.emm <- emmeans(m.sales.by.total.avg.itr.review, ~ total.reviews + avg.review)
anova(m.sales.by.total.avg.review, m.sales.by.total.avg.itr.review)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ total.reviews + avg.review
## Model 2: daily.sales ~ total.reviews * avg.review
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5997 3059870                                  
## 2   5996 2985945  1     73925 148.45 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova Comparison

# ANOVA Comparison
anova_comparison_split <- anova(m.sales.by.total.review, m.sales.by.avg.review)
print(anova_comparison_split)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ total.reviews
## Model 2: daily.sales ~ avg.review
##   Res.Df     RSS Df Sum of Sq F Pr(>F)
## 1   5998 3095564                      
## 2   5998 5477558  0  -2381994
# No significance test here: the two models are not nested (Df = 0), so anova() cannot give an F or p-value

anova_comparison_combined <- anova(m.sales.by.total.avg.review, m.sales.by.total.review)
print(anova_comparison_combined)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ total.reviews + avg.review
## Model 2: daily.sales ~ total.reviews
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5997 3059870                                  
## 2   5998 3095564 -1    -35694 69.957 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_interaction_comparison <- anova(m.sales.by.total.avg.itr.review, m.sales.by.total.avg.review)
print(anova_interaction_comparison)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ total.reviews * avg.review
## Model 2: daily.sales ~ total.reviews + avg.review
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5996 2985945                                  
## 2   5997 3059870 -1    -73925 148.45 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis 2: The Effect of Sale Price on Book Sales by Genre

Linear regression model with genre interaction

# Linear regression model to study the effect of sale price on daily sales across genres
lm_sales_price_by_genre <- lm(daily.sales ~ sale.price * genre, data = book_sales)
summary(lm_sales_price_by_genre)
## 
## Call:
## lm(formula = daily.sales ~ sale.price * genre, data = book_sales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -115.35  -13.69    0.19   13.54   96.32 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  88.2127     2.0456  43.123  < 2e-16 ***
## sale.price                   -0.7104     0.2492  -2.851  0.00437 ** 
## genrenon_fiction            -23.6300     4.1904  -5.639 1.79e-08 ***
## genreYA_fiction              52.9727     2.8720  18.444  < 2e-16 ***
## sale.price:genrenon_fiction   0.6383     0.3474   1.838  0.06617 .  
## sale.price:genreYA_fiction   -2.8284     0.3502  -8.077 7.99e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.91 on 5994 degrees of freedom
## Multiple R-squared:  0.4751, Adjusted R-squared:  0.4746 
## F-statistic:  1085 on 5 and 5994 DF,  p-value: < 2.2e-16

Plotting results

# Scatter plot with regression line for reviews
scatter_1 <- ggplot(book_sales, aes(x = total.reviews, y = daily.sales)) +
  geom_point(aes(color = genre), alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Daily Sales vs Total Reviews by Genre")

# Scatter plot with regression line for sale price
scatter_2 <- ggplot(book_sales, aes(x = sale.price, y = daily.sales)) +
  geom_point(aes(color = genre), alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Daily Sales vs Sale Price by Genre")

ANOVA Analysis

# ANOVA to compare the price-only model with the model that adds genre and the price-by-genre interaction
lm_sales_price <- lm(daily.sales ~ sale.price, data = book_sales)
anova_sales_price_genre <- anova(lm_sales_price, lm_sales_price_by_genre)
print(anova_sales_price_genre)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price * genre
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5998 4065261                                  
## 2   5994 2876489  4   1188772 619.29 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Emmeans Analysis

# emmeans to estimate mean daily sales for each genre at the average sale price
emm_genre_effect <- emmeans(lm_sales_price_by_genre, ~ sale.price | genre)
summary(emm_genre_effect)
## genre = adult_fiction:
##  sale.price emmean    SE   df lower.CL upper.CL
##        10.3   80.9 0.759 5994     79.4     82.4
## 
## genre = non_fiction:
##  sale.price emmean    SE   df lower.CL upper.CL
##        10.3   63.8 1.233 5994     61.4     66.3
## 
## genre = YA_fiction:
##  sale.price emmean    SE   df lower.CL upper.CL
##        10.3  104.7 0.758 5994    103.3    106.2
## 
## Confidence level used: 0.95

Analysis

This analysis examines e-book sales data from a publishing company to understand what drives book sales. In the data exploration stage, no missing values were found. I then inspected the data using histograms and box plots, which showed that some books had an average review score of 0, indicating that these books had not yet received any reviews.

I used Spearman's rank correlation to examine the relationships between the variables. Total reviews and daily sales were positively correlated (ρ = 0.68, p < 0.001), indicating that books with more reviews tend to have higher daily sales. In contrast, the correlation between average review score and daily sales was close to zero and not significant (ρ = -0.01, p = 0.56), so review quality on its own shows essentially no monotonic association with sales.
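As a minimal sketch of what the Spearman statistic measures, the base R snippet below runs on simulated toy data. The variables `toy_reviews` and `toy_sales` are hypothetical and are not the `book_sales` data. It shows that Spearman's ρ is the Pearson correlation of the ranks, and uses `cor.test()` from base R rather than the `rcorr()` call used above.

```r
# Hypothetical toy data (NOT the book_sales data): reviews and sales for 200 books
set.seed(1)
toy_reviews <- rpois(200, lambda = 100)
toy_sales   <- 15 + 0.5 * toy_reviews + rnorm(200, sd = 20)

# Spearman's rho computed directly ...
rho_spearman <- cor(toy_reviews, toy_sales, method = "spearman")

# ... and as the Pearson correlation of the ranks (ties get average ranks)
rho_ranks <- cor(rank(toy_reviews), rank(toy_sales))

# Test of rho = 0; exact = FALSE avoids the warning raised when ranks are tied
spearman_test <- cor.test(toy_reviews, toy_sales,
                          method = "spearman", exact = FALSE)

print(c(spearman = rho_spearman, pearson_on_ranks = rho_ranks))
print(spearman_test$p.value)
```

Because the statistic depends only on ranks, it is not distorted by the skewed counts that are typical of review and sales data.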

In the regression analysis stage, I fitted several models to examine how these variables relate to daily sales:

  1. Influence of Total Reviews on Sales: This model, which uses total reviews as the only predictor (R² = 0.4351), shows a significant positive effect on daily sales. Each additional review is associated with an increase of approximately 0.53 units in daily sales (t(5998) = 67.97, p < 2e-16, 95% CI [0.51, 0.54]).

  2. Role of Average Review Scores in Sales: This model evaluates the effect of average review score on sales. The relationship is not statistically significant (t(5998) = -1.54, p = 0.123), with a coefficient of -1.0593 (95% CI [-2.40, 0.29]). Although the point estimate is slightly negative, the confidence interval includes zero, so this model provides no evidence that average review score alone affects sales.

  3. Combined Effects of Total and Average Reviews: This model includes both total reviews and average review score (R² = 0.4416), and both predictors are significant. Total reviews is a positive predictor (β = 0.5334, 95% CI [0.518, 0.549]), whereas average review score has a negative effect once total reviews is held constant (β = -4.3065, 95% CI [-5.32, -3.30]).

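  4. Interaction Between Total and Average Reviews: An interaction model (R² = 0.4551) captures how the two predictors work together. The significant interaction term (β = 0.0956, p < 2e-16, 95% CI [0.080, 0.111]) indicates that the effect of average review score on sales depends on the number of total reviews: the negative effect of average review score (β = -14.67 at zero reviews) becomes weaker as total reviews increase. Adding the interaction significantly improves the fit over the additive model (F(1, 5996) = 148.45, p < 2.2e-16).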
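One way to read the interaction model is through simple slopes: the effect of one predictor at a fixed value of the other. The sketch below hard-codes the coefficients from the interaction model summary above. The helper functions `avg_slope()` and `total_slope()` and the review counts 50, 100 and 200 are illustrative assumptions, and whether those counts fall inside the observed data range is not checked here.

```r
# Coefficients copied from summary(m.sales.by.total.avg.itr.review)
b_total <- 0.139838    # total.reviews
b_avg   <- -14.673445  # avg.review
b_int   <- 0.095620    # total.reviews:avg.review

# Effect of one extra point of average review score at a given review count
avg_slope <- function(total_reviews) b_avg + b_int * total_reviews

# Effect of one extra review at a given average review score
total_slope <- function(avg_review) b_total + b_int * avg_review

# Review count at which the avg.review slope changes sign
crossover <- -b_avg / b_int

print(avg_slope(c(50, 100, 200)))  # illustrative review counts
print(total_slope(4))              # illustrative average score of 4
print(crossover)
```

Under these coefficients the slope for average review score is negative for books with few reviews and turns positive once a book has more than roughly 153 reviews.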

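For a deeper understanding, genre was also included in the analysis. The publisher divides its books into three genres: adult fiction, non-fiction, and young adult (YA) fiction. The linear regression of daily sales on sale price, genre, and their interaction shows that each genre reacts differently to price. For adult fiction, the reference genre, each one-unit increase in price is associated with 0.71 fewer daily sales (β = -0.7104, p = 0.004). The price slope for non-fiction does not differ significantly from that of adult fiction (interaction β = 0.6383, p = 0.066), while YA fiction is much more price-sensitive (interaction β = -2.8284, p < 0.001), giving a slope of about -3.54. The genre coefficients themselves (-23.6300 for non-fiction and 52.9727 for YA fiction) are differences in the intercept relative to adult fiction, that is, differences in expected sales at a price of zero, not price effects.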
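The per-genre price slopes implied by the coefficient table can be worked out by adding each interaction term to the baseline `sale.price` coefficient. The sketch below hard-codes the rounded coefficients from the model summary. The vector names and the `predicted_sales()` helper are illustrative and not part of the original analysis.

```r
# Coefficients copied from summary(lm_sales_price_by_genre);
# adult_fiction is the reference genre
b <- c(intercept = 88.2127, price = -0.7104,
       non_fiction = -23.6300, YA_fiction = 52.9727,
       price_non_fiction = 0.6383, price_YA_fiction = -2.8284)

# Price slope per genre = baseline slope + that genre's interaction term
price_slopes <- c(
  adult_fiction = b[["price"]],
  non_fiction   = b[["price"]] + b[["price_non_fiction"]],
  YA_fiction    = b[["price"]] + b[["price_YA_fiction"]]
)

# Intercept per genre = baseline intercept + that genre's main effect
genre_intercepts <- c(
  adult_fiction = b[["intercept"]],
  non_fiction   = b[["intercept"]] + b[["non_fiction"]],
  YA_fiction    = b[["intercept"]] + b[["YA_fiction"]]
)

# Predicted daily sales for every genre at a given sale price
predicted_sales <- function(price) genre_intercepts + price_slopes * price

print(round(price_slopes, 4))
print(round(predicted_sales(10.3), 1))
```

At a price of 10.3 the predictions approximately match the estimated marginal means in the emmeans output above. Small differences can arise because the coefficients and the reference price are rounded.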

scatter_1
## `geom_smooth()` using formula = 'y ~ x'

scatter_2
## `geom_smooth()` using formula = 'y ~ x'

To visualize the results, I used scatter plots with a fitted regression line and points coloured by genre. The first plot, daily sales against total reviews, shows a clear positive trend: as the number of reviews increases, so do daily sales. The trend appears in all three genres, which suggests that customer engagement through reviews is a consistent predictor of sales performance. The second plot, daily sales against sale price, shows sales declining as price increases, and the decline is most noticeable for young adult fiction (YA_fiction).

In addition to the regression by genre, I ran an ANOVA model comparison and an estimated marginal means (EMM) analysis. The ANOVA comparison is highly significant (F(4, 5994) = 619.29, p < 2.2e-16): the model with sale price, genre, and their interaction has a much lower RSS (2,876,489) than the model with sale price alone (4,065,261), so the data are fitted better when genre is considered. The EMM analysis gives the estimated mean daily sales for each genre at the average sale price of 10.3: 80.9 for adult fiction (95% CI [79.4, 82.4]), 63.8 for non-fiction (95% CI [61.4, 66.3]), and 104.7 for young adult fiction (95% CI [103.3, 106.2]). At the average price, young adult fiction therefore sells the most and non-fiction the least. These means compare sales levels at a single price; they do not measure price sensitivity, which comes from the regression slopes, where young adult fiction is the most sensitive to price changes.
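The F statistics reported by `anova()` for nested models can be checked by hand from the residual sums of squares. The sketch below is a minimal illustration that uses the rounded RSS values printed in the ANOVA tables above, so it reproduces the F values only up to rounding. The function name `nested_f()` is hypothetical.

```r
# F-test for two nested linear models, computed from their RSS and residual df.
# The reduced model is the smaller one (fewer parameters, more residual df).
nested_f <- function(rss_reduced, rss_full, df_reduced, df_full) {
  df_diff <- df_reduced - df_full
  f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
  p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)
  list(f = f_stat, df = c(df_diff, df_full), p = p_value)
}

# Price-only model vs. price * genre model (values from the ANOVA table)
genre_test <- nested_f(4065261, 2876489, 5998, 5994)

# Additive review model vs. review interaction model
interaction_test <- nested_f(3059870, 2985945, 5997, 5996)

print(round(genre_test$f, 2))
print(round(interaction_test$f, 2))
```

The same calculation shows why the comparison of `daily.sales ~ total.reviews` with `daily.sales ~ avg.review` returns no test: the two models have the same residual degrees of freedom, so `df_diff` is zero and the F ratio is undefined.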

In conclusion, the findings suggest that total reviews are a strong predictor of book sales, whereas average review score does not have a significant effect when considered in isolation. The price of a book does affect its sales, and this effect is moderated by the book’s genre.